25 research outputs found
An improved Belief Propagation algorithm finds many Bethe states in the random field Ising model on random graphs
We first present an empirical study of the Belief Propagation (BP) algorithm
when run on the random field Ising model defined on random regular graphs in
the zero-temperature limit. We introduce the notion of maximal solutions of
the BP equations and use them to fix a fraction of spins in their ground-state
configuration. At the phase transition point the unconstrained spins
percolate, and their number diverges with the system size. This in turn makes
the associated optimization problem highly non-trivial in the critical region.
Using the bounds on the BP messages provided by the maximal solutions, we
design a new, easy-to-implement BP scheme that can output a large number of
stable fixed points. On the one hand, this new algorithm provides the
minimum-energy configuration with high probability in competitive time. On
the other hand, we find that the number of fixed points of the BP algorithm
grows with the system size in the critical region. This unexpected feature
poses relevant new questions about the physics of this class of models.
Comment: 20 pages, 8 figures
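For readers who want the mechanics, here is a minimal sketch of the underlying zero-temperature cavity iteration: a generic BP update for the RFIM on a random regular graph, not the authors' improved scheme. The function name `bp_rfim_T0`, the graph degree, field variance, damping and initialization are all illustrative choices.

```python
import numpy as np
import networkx as nx

def bp_rfim_T0(G, J, fields, n_iter=300, tol=1e-9, damping=0.5, seed=0):
    """Zero-temperature BP for the RFIM: iterate cavity fields u[i -> j].

    At T = 0 the ferromagnetic BP update reduces to
        u_{i->j} = sign(h) * min(|h|, J),
    with h = fields[i] + sum of incoming messages u[k -> i], k != j.
    """
    rng = np.random.default_rng(seed)
    edges = [(i, j) for i, j in G.edges()] + [(j, i) for i, j in G.edges()]
    u = {e: rng.normal(scale=0.1) for e in edges}
    for _ in range(n_iter):
        max_diff = 0.0
        for (i, j) in edges:
            h = fields[i] + sum(u[(k, i)] for k in G.neighbors(i) if k != j)
            new = np.sign(h) * min(abs(h), J)
            max_diff = max(max_diff, abs(new - u[(i, j)]))
            u[(i, j)] = damping * u[(i, j)] + (1 - damping) * new
        if max_diff < tol:
            break
    # total local field on each spin; its sign is the BP estimate of the spin
    h_loc = {i: fields[i] + sum(u[(k, i)] for k in G.neighbors(i)) for i in G}
    return u, h_loc

N, degree = 500, 3
fields = np.random.default_rng(0).normal(scale=1.0, size=N)
G = nx.random_regular_graph(degree, N, seed=0)
u, h_loc = bp_rfim_T0(G, J=1.0, fields=fields)
s = {i: 1 if h_loc[i] > 0 else -1 for i in G}
E = -sum(s[i] * s[j] for i, j in G.edges()) - sum(fields[i] * s[i] for i in G)
print("energy per spin of the BP configuration:", E / N)
```

Note that the abstract's constrained/unconstrained spins are identified through bounds derived from the maximal solutions, not from this plain iteration; the sketch only shows the message-passing step those bounds act on.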
Deep learning via message passing algorithms based on belief propagation
Message-passing algorithms based on the belief propagation (BP) equations constitute a
well-known distributed computational scheme. They yield exact marginals on tree-like graphical
models and have also proven to be effective in many problems defined on loopy graphs, from
inference to optimization, from signal processing to clustering. The BP-based schemes are
fundamentally different from stochastic gradient descent (SGD), on which the current success of
deep networks is based. In this paper, we present a family of BP-based
message-passing algorithms with a reinforcement term that biases distributions
towards locally entropic solutions, and we adapt them to mini-batch training
on GPUs. These algorithms can train multi-layer neural networks with
performance comparable to SGD heuristics in a diverse set of experiments on
natural datasets, including multi-class image classification and continual
learning, while yielding improved performance on sparse networks. Furthermore,
they allow one to make approximate Bayesian predictions with higher accuracy
than point-wise ones.
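As a hedged illustration of the reinforcement idea in its simplest setting, the sketch below runs reinforced BP on a single binary perceptron (one layer, not the multi-layer GPU variant developed in the paper), using the standard central-limit approximation for the factor-to-variable messages. The reinforcement update shown is one common variant; `gamma`, `damping` and the instance sizes are arbitrary, and convergence depends on the schedule.

```python
import numpy as np
from scipy.special import log_ndtr  # log of the Gaussian CDF, numerically stable

def reinforced_bp_perceptron(X, max_iter=200, gamma=0.05, damping=0.2, seed=0):
    """Reinforced BP sketch for a binary perceptron with weights w in {-1,+1}.

    X has shape (P, N); row mu is the sign-folded pattern x^mu = sigma^mu xi^mu,
    so every constraint reads sum_i w_i x^mu_i > 0.  Factor-to-variable messages
    use the Gaussian (central-limit) approximation of the cavity sum; the
    reinforcement field progressively polarizes the marginals toward a single
    configuration.
    """
    P, N = X.shape
    u = np.zeros((P, N))                        # factor -> variable cavity fields
    h_r = 0.01 * np.random.default_rng(seed).normal(size=N)  # reinforcement fields
    for t in range(max_iter):
        h = h_r + u.sum(axis=0)                 # total local fields
        m_cav = np.tanh(h[None, :] - u)         # variable -> factor magnetizations
        M = (X * m_cav).sum(axis=1, keepdims=True) - X * m_cav   # cavity means
        V = (1 - m_cav**2).sum(axis=1, keepdims=True) - (1 - m_cav**2)
        V = np.maximum(V, 1e-9)                 # cavity variances
        # P(sat | w_i = +-1) = Phi((M +- x_i)/sqrt(V)); message = half log-ratio
        u_new = 0.5 * (log_ndtr((M + X) / np.sqrt(V))
                       - log_ndtr((M - X) / np.sqrt(V)))
        u = damping * u + (1 - damping) * u_new
        h_r += gamma * (h_r + u.sum(axis=0))    # reinforcement (one common variant)
        w = np.sign(h_r + u.sum(axis=0))
        if np.all(X @ w > 0):                   # all patterns satisfied
            return w, t
    return w, max_iter

rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(60, 201))     # P = 60 patterns, N = 201 weights
w, t_stop = reinforced_bp_perceptron(X)
print("solved" if np.all(X @ w > 0) else "not solved", "after", t_stop, "sweeps")
```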
Typical and atypical solutions in non-convex neural networks with discrete and continuous weights
We study the binary and continuous negative-margin perceptrons as simple
non-convex neural network models learning random rules and associations. We
analyze the geometry of the landscape of solutions in both models and find
important similarities and differences. Both models exhibit subdominant
minimizers which are extremely flat and wide. These minimizers coexist with a
background of dominant solutions composed of an exponential number of
algorithmically inaccessible small clusters for the binary case (the frozen
1-RSB phase) or a hierarchical structure of clusters of different sizes for the
spherical case (the full RSB phase). In both cases, when a certain threshold in
constraint density is crossed, the local entropy of the wide flat minima
becomes non-monotonic, indicating a break-up of the space of robust solutions
into disconnected components. This has a strong impact on the behavior of
algorithms in binary models, which cannot access the remaining isolated
clusters. For the spherical case the behaviour is different, since even beyond
the disappearance of the wide flat minima the remaining solutions are shown to
always be surrounded by a large number of other solutions at any distance, up
to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find
solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB
approximation. For both models, the generalization performance as a learning
device is shown to be greatly improved by the existence of wide flat minimizers
even when trained in the highly underconstrained regime of very negative
margins.
Comment: 34 pages, 13 figures
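To make the setting concrete, here is a small sketch of the negative-margin perceptron as a continuous constraint satisfaction problem, with a plain hinge-loss descent used as a generic solver; this is not one of the algorithms analyzed in the paper, and the normalization convention and sizes are illustrative.

```python
import numpy as np

def margins(w, X, y):
    """Stabilities sqrt(N) * y_mu (x_mu . w) / ||w||, with unit-norm rows in X."""
    return np.sqrt(len(w)) * y * (X @ w) / np.linalg.norm(w)

def is_solution(w, X, y, kappa):
    """For kappa < 0, patterns may sit on the wrong side, but not too far."""
    return bool(np.all(margins(w, X, y) >= kappa))

def train_spherical(X, y, kappa, lr=0.05, steps=5000, seed=0):
    """Plain descent on the hinge loss sum_mu max(0, kappa - margin_mu)."""
    w = np.random.default_rng(seed).normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        gap = np.maximum(0.0, kappa - margins(w, X, y))
        if not gap.any():
            break
        w += lr * ((gap * y) @ X)       # push along the violated constraints
        w /= np.linalg.norm(w)          # stay on the sphere
    return w

rng = np.random.default_rng(0)
N, P, kappa = 400, 600, -0.5            # constraint density alpha = P/N = 1.5
X = rng.normal(size=(P, N))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.choice([-1.0, 1.0], size=P)     # random associations
w = train_spherical(X, y, kappa)
print("solution found:", is_solution(w, X, y, kappa))
```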
The Hidden-Manifold Hopfield Model and a learning phase transition
The Hopfield model has a long-standing tradition in statistical physics,
being one of the few neural networks for which a theory is available. Extending
the theory of Hopfield models for correlated data could help understand the
success of deep neural networks, for instance describing how they extract
features from data. Motivated by this, we propose and investigate a
generalized Hopfield model that we name the Hidden-Manifold Hopfield Model: we
generate the couplings from $P$ examples with the Hebb rule, using a
non-linear transformation of $D$ random vectors that we call factors, with $N$
the number of neurons. Using the replica method, we obtain a phase diagram for
the model that shows a phase transition where the factors hidden in the
examples become attractors of the dynamics; this phase exists above a critical
value of $\alpha = P/N$ and below a critical value of $\alpha_D = D/N$. We
call this behaviour a learning phase transition.
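A minimal sketch of the construction follows, assuming it matches the generative recipe described above: random factors, random combination coefficients, a sign() stand-in for the generic non-linearity, and Hebb couplings. All sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, P = 1000, 50, 300                      # neurons, factors, examples

F = rng.choice([-1.0, 1.0], size=(D, N))     # hidden factors
C = rng.normal(size=(P, D))                  # random combination coefficients
examples = np.sign(C @ F / np.sqrt(D))       # non-linear transformation of factors

J = examples.T @ examples / N                # Hebb rule on the examples
np.fill_diagonal(J, 0.0)

def relax(s, J, sweeps=50, rng=rng):
    """Zero-temperature asynchronous Hopfield dynamics."""
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1.0 if J[i] @ s > 0 else -1.0
    return s

# start from a corrupted copy of factor 0 and see whether it is an attractor
s0 = F[0].copy()
s0[rng.random(N) < 0.1] *= -1                # flip 10% of the spins
s = relax(s0, J)
print("overlap with factor 0:", (s @ F[0]) / N)
```

Whether the factor is actually retrieved depends on where $(\alpha, \alpha_D) = (P/N, D/N)$ falls in the phase diagram; the sketch only shows how the question is posed numerically.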
Storage and Learning phase transitions in the Random-Features Hopfield Model
The Hopfield model is a paradigmatic model of neural networks that has been
analyzed for many decades in the statistical physics, neuroscience, and machine
learning communities. Inspired by the manifold hypothesis in machine learning,
we propose and investigate a generalization of the standard setting that we
name the Random-Features Hopfield Model. Here $P$ binary patterns of length
$N$ are generated by applying a random projection followed by a non-linearity
to Gaussian vectors sampled in a latent space of dimension $D$. Using the
replica method from statistical physics, we derive the phase diagram of the
model in the limit $N, P, D \to \infty$ with fixed ratios $\alpha = P/N$ and
$\alpha_D = D/N$. Besides the usual retrieval phase, where the patterns can be
dynamically recovered from some initial corruption, we uncover a new phase
where the features characterizing the projection can be recovered instead. We
call this phenomenon the learning phase transition, as the features are not
explicitly given to the model but are instead inferred from the patterns in an
unsupervised fashion.
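The two phases can be probed numerically by measuring overlaps after retrieval dynamics, as in the hedged sketch below; the sign-binarized feature is only an illustrative proxy for the projection column, and whether either overlap stays high depends on $(\alpha, \alpha_D)$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, P = 1000, 100, 200                     # pattern length, latent dim., patterns

proj = rng.normal(size=(N, D)) / np.sqrt(D)  # random projection
Z = rng.normal(size=(P, D))                  # Gaussian latent vectors
patterns = np.sign(Z @ proj.T)               # non-linearity: sign()
features = np.sign(proj.T)                   # +-1 proxies for projection columns

J = patterns.T @ patterns / N                # Hebb rule
np.fill_diagonal(J, 0.0)

def overlap_after_dynamics(target, J, corruption=0.1, sweeps=30, rng=rng):
    """Corrupt `target`, relax with zero-T dynamics, return the final overlap."""
    s = target.copy()
    s[rng.random(len(s)) < corruption] *= -1
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1.0 if J[i] @ s > 0 else -1.0
    return (s @ target) / len(s)

print("pattern 0 overlap:", overlap_after_dynamics(patterns[0], J))   # retrieval
print("feature 0 overlap:", overlap_after_dynamics(features[0], J))   # learning
```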
Entropic gradient descent algorithms and wide flat minima
The properties of flat minima in the empirical risk landscape of neural
networks have been debated for some time. Increasing evidence suggests they
possess better generalization capabilities than sharp ones. First,
we discuss Gaussian mixture classification models and show analytically that
there exist Bayes optimal pointwise estimators which correspond to minimizers
belonging to wide flat regions. These estimators can be found by applying
maximum-flatness algorithms either directly on the classifier (which is
norm-independent) or on the differentiable loss function used in learning. Next, we
extend the analysis to the deep learning scenario by extensive numerical
validations. Using two algorithms, Entropy-SGD and Replicated-SGD, that
explicitly include in the optimization objective a non-local flatness measure
known as local entropy, we consistently improve the generalization error for
common architectures (e.g. ResNet, EfficientNet). An easy-to-compute flatness
measure shows a clear correlation with test accuracy.
Comment: updated version focusing on numerical experiments
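For orientation, here is a structural sketch of one Entropy-SGD step in the spirit of Chaudhari et al.: an inner SGLD loop estimates the mean of a local Gibbs measure, and the outer step follows the resulting local-entropy gradient. The toy 1-D landscape, step sizes and noise level are arbitrary; the example shows the two-loop structure, not the method's advantages.

```python
import numpy as np

def entropy_sgd_step(w, grad_fn, eta=0.1, gamma=1.0, L=20,
                     sgld_lr=0.01, noise=0.05, rng=None):
    """One Entropy-SGD outer step (sketch).

    The inner SGLD loop samples w' from the local Gibbs measure
    ~ exp(-f(w') - gamma/2 ||w' - w||^2) and tracks a running mean mu;
    the outer step moves w toward mu, i.e. along the local-entropy gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    wp, mu = w.copy(), w.copy()
    for _ in range(L):
        g = grad_fn(wp) + gamma * (wp - w)   # gradient of the biased loss
        wp = wp - sgld_lr * g + np.sqrt(sgld_lr) * noise * rng.normal(size=w.shape)
        mu = 0.75 * mu + 0.25 * wp           # exponential moving average
    return w - eta * gamma * (w - mu)

# toy 1-D landscape: a wide valley at x = -2, a narrow one at x = +2
def grad_f(w):
    x = w[0]
    wide, narrow = 0.05 * (x + 2) ** 2, 2.0 * (x - 2) ** 2
    return np.array([0.1 * (x + 2) if wide < narrow else 4.0 * (x - 2)])

rng = np.random.default_rng(0)
w = np.array([0.5])       # a start from which the wide valley is reachable
for _ in range(300):
    w = entropy_sgd_step(w, grad_f, rng=rng)
print("ended near", w)    # plain GD would also reach -2 from here; the point
                          # of the sketch is the two-loop update structure
```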
The star-shaped space of solutions of the spherical negative perceptron
Empirical studies on the landscape of neural networks have shown that
low-energy configurations are often found in complex connected structures,
where zero-energy paths between pairs of distant solutions can be constructed.
Here we consider the spherical negative perceptron, a prototypical non-convex
neural network model framed as a continuous constraint satisfaction problem. We
introduce a general analytical method for computing energy barriers in the
simplex with vertex configurations sampled from the equilibrium. We find that
in the over-parameterized regime the solution manifold displays simple
connectivity properties. There exists a large geodesically convex component
that is attractive for a wide range of optimization dynamics. Inside this
region we identify a subset of atypical high-margin solutions that are
geodesically connected with most other solutions, giving rise to a star-shaped
geometry. We analytically characterize the organization of the connected space
of solutions and show numerical evidence of a transition, at larger constraint
densities, where the aforementioned simple geodesic connectivity breaks down.
Comment: 27 pages, 16 figures, comments are welcome
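A crude numerical version of the connectivity question: find two independent solutions of the spherical negative perceptron, walk the geodesic between them, and record the number of violated constraints along the way. The sketch below uses a generic hinge-loss solver, not the paper's analytical method, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P, kappa = 200, 150, -0.3            # alpha = P/N = 0.75, negative margin
X = rng.normal(size=(P, N))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm patterns

def energy(w):
    """Number of violated constraints (margins below kappa) for unit-norm w."""
    return int(np.sum(np.sqrt(N) * (X @ w) < kappa))

def solve(seed):
    """Generic hinge-loss descent, used only to obtain independent solutions."""
    w = np.random.default_rng(seed).normal(size=N)
    w /= np.linalg.norm(w)
    for _ in range(5000):
        gap = np.maximum(0.0, kappa - np.sqrt(N) * (X @ w))
        if not gap.any():
            break
        w += 0.05 * (gap @ X)
        w /= np.linalg.norm(w)
    return w

wa, wb = solve(0), solve(1)
ts = np.linspace(0.0, 1.0, 21)
path = [(1 - t) * wa + t * wb for t in ts]      # chord between the solutions...
path = [v / np.linalg.norm(v) for v in path]    # ...projected back to the sphere
print("energies along the geodesic:", [energy(v) for v in path])
# the barrier is the maximum energy along the path; in the over-parameterized
# regime one often finds zero-energy paths, consistent with a connected manifold
```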
Consensus guidelines for the use and interpretation of angiogenesis assays
The formation of new blood vessels, or angiogenesis, is a complex process that plays important roles in growth and development, tissue and organ regeneration, and numerous pathological conditions. Angiogenesis proceeds through multiple discrete steps that can be individually evaluated and quantified by a large number of bioassays. These independent assessments hold advantages but also have limitations. This article describes the in vivo, ex vivo, and in vitro bioassays that are available for the evaluation of angiogenesis and highlights critical aspects that are relevant for their execution and proper interpretation. As such, this collaborative work is the first edition of consensus guidelines on angiogenesis bioassays, intended to serve as a current and future reference.
Unveiling the structure of wide flat minima in neural networks
The success of deep learning has revealed the application potential of neural
networks across the sciences and opened up fundamental theoretical problems. In
particular, the fact that learning algorithms based on simple variants of
gradient methods are able to find near-optimal minima of highly nonconvex loss
functions is an unexpected feature of neural networks. Moreover, such
algorithms are able to fit the data even in the presence of noise, and yet they
have excellent predictive capabilities. Several empirical results have shown a
reproducible correlation between the so-called flatness of the minima achieved
by the algorithms and the generalization performance. At the same time,
statistical physics results have shown that in nonconvex networks a multitude
of narrow minima may coexist with a much smaller number of wide flat minima,
which generalize well. Here we show that wide flat minima arise as complex
extensive structures, from the coalescence of minima around "high-margin"
(i.e., locally robust) configurations. Despite being exponentially rare
compared to zero-margin ones, high-margin minima tend to concentrate in
particular regions. These minima are in turn surrounded by other solutions of
smaller and smaller margin, leading to dense regions of solutions over long
distances. Our analysis also provides an alternative analytical method for
estimating when flat minima appear and when algorithms begin to find solutions,
as the number of model parameters varies.
Comment: 15 pages, 8 figures
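The margin/flatness relation can be probed with a simple Monte Carlo estimate of the energy at fixed overlap with a reference solution, sketched below for a spherical perceptron. This is a generic numerical probe under illustrative parameters, not the paper's analytical computation.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, kappa = 200, 100, 0.0
X = rng.normal(size=(P, N))
X /= np.linalg.norm(X, axis=1, keepdims=True)

def solve(target_margin, seed=0):
    """Generic hinge-loss descent at margin target_margin."""
    w = np.random.default_rng(seed).normal(size=N)
    w /= np.linalg.norm(w)
    for _ in range(5000):
        gap = np.maximum(0.0, target_margin - np.sqrt(N) * (X @ w))
        if not gap.any():
            break
        w += 0.05 * (gap @ X)
        w /= np.linalg.norm(w)
    return w

def perturbed_energy(w, q, n_samples=200):
    """Mean fraction of violated constraints at overlap q with unit-norm w."""
    viol = []
    for _ in range(n_samples):
        z = rng.normal(size=N)
        z -= (z @ w) * w                    # project out the w direction
        z /= np.linalg.norm(z)
        wp = q * w + np.sqrt(1 - q**2) * z  # random point at overlap q with w
        viol.append(np.mean(np.sqrt(N) * (X @ wp) < kappa))
    return float(np.mean(viol))

w0 = solve(kappa)          # a typical, barely-satisfying solution
w1 = solve(kappa + 0.5)    # a rarer, high-margin solution
for q in (0.99, 0.95, 0.9, 0.8):
    print(q, perturbed_energy(w0, q), perturbed_energy(w1, q))
# the high-margin solution should stay at low energy down to smaller q,
# i.e. it sits in a wider and flatter region
```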